{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Upgrade an index to use ELSER model\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/model-upgrades/upgrading-index-to-use-elser.ipynb)\n",
    "\n",
    "In this notebook we will see example on how to upgrade your index to ELSER model `.elser_model_2` using [Reindex API](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch.Elasticsearch.reindex). \n",
    "\n",
    "**Note:** Alternatively, you could also [Update by query](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html) to update index in place to use ELSER. In this notebook, we will see examples on using Reindex API. \n",
    "\n",
    "\n",
    "Scenerios that we will see in this notebook:\n",
    "\n",
    "1. Migrating a index which hasn't  generated [`text_expansion`](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-text-expansion-query.html) field to  ELSER model `.elser_model_2` \n",
    "2. Upgrade an existing index with `.elser_model_1` to use `.elser_model_2` model\n",
    "3. Upgrade a index which use different model to use ELSER"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Install and Connect\n",
    "\n",
    "To get started, we'll need to connect to our Elastic deployment using the Python client.\n",
    "Because we're using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify our deployment.\n",
    "First we need to `pip` install the following packages:\n",
    "\n",
    "- `elasticsearch`\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install \"elasticsearch<9\" -qU"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we will import all the modules that we need. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from elasticsearch import Elasticsearch, helpers\n",
    "from urllib.request import urlopen\n",
    "from getpass import getpass\n",
    "import json\n",
    "import time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we will instantiate the Python Elasticsearch client. First we prompt for  password and Cloud ID.\n",
    "\n",
    "Then we create a `client` object that instantiates an instance of the `Elasticsearch` class.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Elastic Cloud ID:  ········\n",
      "Elastic Api Key:  ········\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'name': 'instance-0000000001', 'cluster_name': 'ad402eb9a59041458b8edfc021e91caf', 'cluster_uuid': 'ks_HfcCdSf2qrcKZQsk9Lg', 'version': {'number': '8.11.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': 'd9ec3fa628c7b0ba3d25692e277ba26814820b20', 'build_date': '2023-11-04T10:04:57.184859352Z', 'build_snapshot': False, 'lucene_version': '9.8.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}\n"
     ]
    }
   ],
   "source": [
    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n",
    "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n",
    "\n",
    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n",
    "ELASTIC_API_KEY = getpass(\"Elastic Api Key: \")\n",
    "\n",
    "# Create the client instance\n",
    "client = Elasticsearch(\n",
    "    cloud_id=ELASTIC_CLOUD_ID,\n",
    "    api_key=ELASTIC_API_KEY,\n",
    ")\n",
    "\n",
    "print(client.info())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Download and Deploy ELSER v2 Model\n",
    "\n",
    "Before we begin, we have to download and deploy ELSER model `.elser_model_2`. \n",
    "\n",
    "Follow the instructions under the section [Download and Deploy ELSER Model](../search/03-ELSER.ipynb#download-and-deploy-elser-model)  from the [ELSER](../search/03-ELSER.ipynb) notebook \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#  Case 1: Migrate an index with no `text_expansion` field\n",
    "\n",
    "In this case we will see how to upgrade an index which has a [ingestion pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html) configured, to use ELSER model `elser_model_2` "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create Ingestion pipeline with lowercase\n",
    "\n",
    "We will create a simple pipeline to convert title field values to lowercase and use this ingestion pipeline on our index. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'acknowledged': True})"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.ingest.put_pipeline(\n",
    "    id=\"ingest-pipeline-lowercase\",\n",
    "    description=\"Ingest pipeline to change title to lowercase\",\n",
    "    processors=[{\"lowercase\": {\"field\": \"title\"}}],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create index - `movies` with mappings\n",
    "\n",
    "Next, we will create a index with pipeline `ingest-pipeline-lowercase` that we created in previous step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'movies'})"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.indices.delete(index=\"movies\", ignore_unavailable=True)\n",
    "client.indices.create(\n",
    "    index=\"movies\",\n",
    "    settings={\n",
    "        \"index\": {\n",
    "            \"number_of_shards\": 1,\n",
    "            \"number_of_replicas\": 1,\n",
    "            \"default_pipeline\": \"ingest-pipeline-lowercase\",\n",
    "        }\n",
    "    },\n",
    "    mappings={\n",
    "        \"properties\": {\n",
    "            \"plot\": {\n",
    "                \"type\": \"text\",\n",
    "                \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}},\n",
    "            },\n",
    "        }\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Insert Documents\n",
    "we are now ready to insert sample dataset of 12 movies to our index `movies`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done indexing documents into `movies` index!\n"
     ]
    }
   ],
   "source": [
    "url = \"https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/notebooks/search/movies.json\"\n",
    "response = urlopen(url)\n",
    "\n",
    "# Load the response data into a JSON object\n",
    "data_json = json.loads(response.read())\n",
    "\n",
    "# Prepare the documents to be indexed\n",
    "documents = []\n",
    "for doc in data_json:\n",
    "    documents.append(\n",
    "        {\n",
    "            \"_index\": \"movies\",\n",
    "            \"_source\": doc,\n",
    "        }\n",
    "    )\n",
    "\n",
    "# Use helpers.bulk to index\n",
    "helpers.bulk(client, documents)\n",
    "\n",
    "time.sleep(5)\n",
    "print(\"Done indexing documents into `movies` index!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Upgrade index `movies` to use ELSER model\n",
    "\n",
    "we are ready to re-index  `movies` to a new index with the ELSER model `.elser_model_2`. As a first step, we have to create new ingestion pipeline and index to use ELSER model. \n",
    "\n",
    "# Create a new pipeline with ELSER \n",
    "Let's create a new ingestion pipeline with ELSER model `.elser_model_2`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'acknowledged': True})"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.ingest.put_pipeline(\n",
    "    id=\"elser-ingest-pipeline\",\n",
    "    description=\"Ingest pipeline for ELSER\",\n",
    "    processors=[\n",
    "        {\n",
    "            \"inference\": {\n",
    "                \"model_id\": \".elser_model_2\",\n",
    "                \"input_output\": [\n",
    "                    {\"input_field\": \"plot\", \"output_field\": \"plot_embedding\"}\n",
    "                ],\n",
    "            }\n",
    "        }\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create a index with mappings\n",
    "\n",
    "Next, create an index with required mappings for ELSER.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-movies'})"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.indices.delete(index=\"elser-movies\", ignore_unavailable=True)\n",
    "client.indices.create(\n",
    "    index=\"elser-movies\",\n",
    "    mappings={\n",
    "        \"properties\": {\n",
    "            \"plot\": {\n",
    "                \"type\": \"text\",\n",
    "                \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}},\n",
    "            },\n",
    "            \"plot_embedding\": {\"type\": \"sparse_vector\"},\n",
    "        }\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:**\n",
    "- `plot_embedding` is the name of the field that contains generated token with the type [`sparse_vector`](https://www.elastic.co/guide/en/elasticsearch/reference/master/sparse-vector.html) \n",
    "- `plot` is the name of the field from which the [`sparse_vector`](https://www.elastic.co/guide/en/elasticsearch/reference/master/sparse-vector.html)  are created. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reindex with updated pipeline \n",
    "\n",
    "With the help of [Reindex API](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch.Elasticsearch.reindex), we can copy data from old index `movies` and to new index `elser-movies` with  ingestion pipeline set to `elser-ingest-pipeline` .  On success, the index `elser-movies` creates tokens on the `text_expansion` terms that you targeted for ELSER inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "client.reindex(\n",
    "    source={\"index\": \"movies\"},\n",
    "    dest={\"index\": \"elser-movies\", \"pipeline\": \"elser-ingest-pipeline\"},\n",
    ")\n",
    "time.sleep(7)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once reindex is complete, inspect any document in the index `elser-movies` and notice that the document has a additional field `plot_embedding` with terms that we will be using in `text_expansion` query. \n",
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Querying documents with ELSER \n",
    "\n",
    "Let's try a semantic search on our index with ELSER model `.elser_model_2`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Score: 6.403748\n",
      "Title: se7en\n",
      "Plot: Two detectives, a rookie and a veteran, hunt a serial killer who uses the seven deadly sins as his motives.\n",
      "\n",
      "Score: 3.6703482\n",
      "Title: the departed\n",
      "Plot: An undercover cop and a mole in the police attempt to identify each other while infiltrating an Irish gang in South Boston.\n",
      "\n",
      "Score: 2.9359207\n",
      "Title: the usual suspects\n",
      "Plot: A sole survivor tells of the twisty events leading up to a horrific gun battle on a boat, which began when five criminals met at a seemingly random police lineup.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "response = client.search(\n",
    "    index=\"elser-movies\",\n",
    "    size=3,\n",
    "    query={\n",
    "        \"text_expansion\": {\n",
    "            \"plot_embedding\": {\n",
    "                \"model_id\": \".elser_model_2\",\n",
    "                \"model_text\": \"investigation\",\n",
    "            }\n",
    "        }\n",
    "    },\n",
    ")\n",
    "\n",
    "for hit in response[\"hits\"][\"hits\"]:\n",
    "    doc_id = hit[\"_id\"]\n",
    "    score = hit[\"_score\"]\n",
    "    title = hit[\"_source\"][\"title\"]\n",
    "    plot = hit[\"_source\"][\"plot\"]\n",
    "    print(f\"Score: {score}\\nTitle: {title}\\nPlot: {plot}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Case 2: Upgrade index with ELSER model to `.elser_model_2`\n",
    "\n",
    "If you already have a index with ELSER model `.elser_model_1` and would like to upgrade to `.elser_model_2`, you can use the Reindex API with ingestion pipeline to use ELSER `.elser_model_2` model.\n",
    "\n",
    "**`Note:`** Before we begin, ensure that you are on Elasticsearch 8.11 version and ELSER model `.elser_model_2` is deployed. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create a new ingestion pipeline\n",
    "\n",
    "We will create a pipeline with `.elser_model_2` to enable us with reindexing. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'acknowledged': True})"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.ingest.put_pipeline(\n",
    "    id=\"elser-pipeline-upgrade-demo\",\n",
    "    description=\"Ingest pipeline for ELSER upgrade demo\",\n",
    "    processors=[\n",
    "        {\n",
    "            \"inference\": {\n",
    "                \"model_id\": \".elser_model_2\",\n",
    "                \"input_output\": [\n",
    "                    {\"input_field\": \"title\", \"output_field\": \"title_embedding\"}\n",
    "                ],\n",
    "            }\n",
    "        }\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create a new index with mappings\n",
    "We will create  a new index with required mappings supporting ELSER"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-upgrade-index-demo'})"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.indices.delete(index=\"elser-upgrade-index-demo\", ignore_unavailable=True)\n",
    "client.indices.create(\n",
    "    index=\"elser-upgrade-index-demo\",\n",
    "    mappings={\n",
    "        \"properties\": {\n",
    "            \"title\": {\n",
    "                \"type\": \"text\",\n",
    "                \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}},\n",
    "            },\n",
    "            \"title_embedding\": {\"type\": \"sparse_vector\"},\n",
    "        }\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Use Reindex API\n",
    "we will use [Reindex API](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch.Elasticsearch.reindex) to move data from old index to new index `elser-upgrade-index-demo`. We will be excluding target field from old index and instead generate new tokens in the field `plot_embedding` with `.elser_model_2` while reindexing. \n",
    "\n",
    "**`Note:`** Make sure to replace `my-index` with your index name that you intend to upgrade and the field `my-tokens-field` with the field name that you have generated tokens previously.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "client.reindex(\n",
    "    source={\n",
    "        \"index\": \"books\",  # replace with your index name\n",
    "        \"_source\": {\n",
    "            \"excludes\": [\n",
    "                \"title_vector\"\n",
    "            ]  # replace with the field-name from your index, that has previously generated tokens\n",
    "        },\n",
    "    },\n",
    "    dest={\n",
    "        \"index\": \"elser-upgrade-index-demo\",\n",
    "        \"pipeline\": \"elser-pipeline-upgrade-demo\",\n",
    "    },\n",
    ")\n",
    "time.sleep(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Querying your data\n",
    "\n",
    "Once reindexing is complete, you are ready to query on your data and perform semantic search "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Score: 14.755971\n",
      "Title: Python Crash Course\n",
      "Plot: Python Crash Course\n",
      "\n",
      "Score: 14.168372\n",
      "Title: The Pragmatic Programmer: Your Journey to Mastery\n",
      "Plot: The Pragmatic Programmer: Your Journey to Mastery\n",
      "\n",
      "Score: 11.704832\n",
      "Title: The Clean Coder: A Code of Conduct for Professional Programmers\n",
      "Plot: The Clean Coder: A Code of Conduct for Professional Programmers\n",
      "\n"
     ]
    }
   ],
   "source": [
    "response = client.search(\n",
    "    index=\"elser-upgrade-index-demo\",\n",
    "    size=3,\n",
    "    query={\n",
    "        \"text_expansion\": {\n",
    "            \"title_embedding\": {\n",
    "                \"model_id\": \".elser_model_2\",\n",
    "                \"model_text\": \"Programming Course\",\n",
    "            }\n",
    "        }\n",
    "    },\n",
    ")\n",
    "\n",
    "for hit in response[\"hits\"][\"hits\"]:\n",
    "    doc_id = hit[\"_id\"]\n",
    "    score = hit[\"_score\"]\n",
    "    title = hit[\"_source\"][\"title\"]\n",
    "    plot = hit[\"_source\"][\"title\"]\n",
    "    print(f\"Score: {score}\\nTitle: {title}\\nPlot: {plot}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Case 3: Upgrade a index with different model to ELSER\n",
    "\n",
    "Now we will see how to move your index which already has generated `embedding` using a different model. \n",
    "\n",
    "Lets consider the index - `books` and has generated `title_vector` using the NLP model `sentence-transformers__all-minilm-l6-v2`. In case you would like know about more how to load a NLP model to an index, follow the steps from our notebook [loading-model-from-hugging-face.ipynb](../integrations/hugging-face/loading-model-from-hugging-face.ipynb)\n",
    "\n",
    "Follow similiar proceedure that we did in previously: \n",
    "1. Create a ingestion pipeline with ELSER model `.elser_model_2`\n",
    "2. Create a index with mappings, with the pipeline we created in the previous step. \n",
    "3. Reindex, excluding the field that has embedding from the `books` index\n",
    "\n",
    "Before we begin, lets take a look at our index `books` and see the mappings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'books': {'aliases': {}, 'mappings': {'properties': {'authors': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'num_reviews': {'type': 'long'}, 'publish_date': {'type': 'date'}, 'publisher': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'summary': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'title': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'title_vector': {'type': 'dense_vector', 'dims': 384, 'index': True, 'similarity': 'cosine'}}}, 'settings': {'index': {'routing': {'allocation': {'include': {'_tier_preference': 'data_content'}}}, 'number_of_shards': '1', 'provided_name': 'books', 'creation_date': '1706118077023', 'number_of_replicas': '1', 'uuid': 'GxGfG_LtSBOIXsB-5bF2_A', 'version': {'created': '8500003'}}}}})"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.indices.get(index=\"books\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice the field `title_vector`, We will exclude this field in our new index and generate new mapping against the field `title` from the `books` index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create ingestion pipeline\n",
    "\n",
    "Next, we will create a pipeline using ELSER model `.elser_model_2`\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'acknowledged': True})"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.ingest.put_pipeline(\n",
    "    id=\"elser-pipeline-books\",\n",
    "    description=\"Ingest pipeline for ELSER upgrade\",\n",
    "    processors=[\n",
    "        {\n",
    "            \"inference\": {\n",
    "                \"model_id\": \".elser_model_2\",\n",
    "                \"input_output\": [\n",
    "                    {\"input_field\": \"title\", \"output_field\": \"title_embedding\"}\n",
    "                ],\n",
    "            }\n",
    "        }\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create index with mappings\n",
    "\n",
    "Lets create a index `elser-books` with mappings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-books'})"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "client.indices.delete(index=\"elser-books\", ignore_unavailable=True)\n",
    "client.indices.create(\n",
    "    index=\"elser-books\",\n",
    "    mappings={\n",
    "        \"properties\": {\n",
    "            \"title\": {\n",
    "                \"type\": \"text\",\n",
    "                \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}},\n",
    "            },\n",
    "            \"title_embedding\": {\"type\": \"sparse_vector\"},\n",
    "        }\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reindex API\n",
    "\n",
    "we will use the [Reindex API](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch.Elasticsearch.reindex) to copy data and generate `text_expansion` embedding to our new index `elser-books`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "client.reindex(\n",
    "    source={\"index\": \"books\", \"_source\": {\"excludes\": [\"title_vector\"]}},\n",
    "    dest={\"index\": \"elser-books\", \"pipeline\": \"elser-pipeline-books\"},\n",
    ")\n",
    "time.sleep(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Querying your data\n",
    "Success! Now we can query data on the index `elser-books`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Score: 22.333044\n",
      "Title: Python Crash Course\n",
      "Score: 9.364547\n",
      "Title: The Pragmatic Programmer: Your Journey to Mastery\n",
      "Score: 8.410445\n",
      "Title: Clean Code: A Handbook of Agile Software Craftsmanship\n"
     ]
    }
   ],
   "source": [
    "response = client.search(\n",
    "    index=\"elser-books\",\n",
    "    size=3,\n",
    "    query={\n",
    "        \"text_expansion\": {\n",
    "            \"title_embedding\": {\n",
    "                \"model_id\": \".elser_model_2\",\n",
    "                \"model_text\": \"Python tutorial\",\n",
    "            }\n",
    "        }\n",
    "    },\n",
    ")\n",
    "\n",
    "for hit in response[\"hits\"][\"hits\"]:\n",
    "    doc_id = hit[\"_id\"]\n",
    "    score = hit[\"_score\"]\n",
    "    title = hit[\"_source\"][\"title\"]\n",
    "    print(f\"Score: {score}\\nTitle: {title}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  },
  "vscode": {
   "interpreter": {
    "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
